BLIP Image Captioning Large
BSD-3-Clause
BLIP is a unified vision-language pretraining framework that handles both understanding and generation tasks, including image captioning. It makes effective use of noisy web data by bootstrapping captions: a captioner generates synthetic captions and a filter removes the noisy ones.
Image-to-Text
Transformers